Using the Visual Denotations of Image Captions for Semantic Inference
Authors
Abstract
Semantic inference is essential to natural language understanding. There are two traditional approaches to it. The logic-based approach translates utterances into a formal meaning representation that is amenable to logical proofs. The vector-based approach maps words to vectors derived from the contexts in which the words appear, and uses real-valued similarities in place of logical inferences. We introduce the notion of the visual denotation of an utterance, which is the set of images that it describes. This notion borrows from the logic-based approach the abstract concept of the denotation of an utterance as the set of possible worlds in which the utterance is true, and instantiates possible worlds as images. In this dissertation, we show how visual denotations can be created for descriptions of everyday entities and events. We also demonstrate that visual denotations can serve as a new model of semantic similarity, and that this model is better at identifying entailment relations between descriptions of images than traditional distributional similarities. To do this, we create an image caption corpus consisting of captions and images depicting everyday actions. The corpus has a number of features that are useful for investigating everyday events and the different ways in which they can be described. We use its captions as the starting point for producing caption fragments with larger visual denotations. We accomplish this by creating a denotation graph, a subsumption hierarchy over the captions that links captions to the images that depict them. The graph also allows the image caption corpus to be visualized and navigated in an intuitive manner.
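The idea of a visual denotation as a set of images can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy corpus `CORPUS`, the word-subset test used as a crude stand-in for the subsumption relation, and the choice of a conditional probability over denotations as the similarity score. The abstract does not specify how fragments are generated or which similarity measure is used.

```python
# Hypothetical toy corpus: image id -> captions describing that image.
CORPUS = {
    "img1": ["a man rides a bike", "a cyclist on a road"],
    "img2": ["a man rides a horse"],
    "img3": ["a woman rides a bike"],
    "img4": ["a dog runs on grass"],
}


def denotation(phrase, corpus):
    """Return the set of images with at least one caption containing
    every word of `phrase` (an assumed, simplified subsumption test)."""
    words = set(phrase.lower().split())
    return {
        image_id
        for image_id, captions in corpus.items()
        if any(words <= set(caption.lower().split()) for caption in captions)
    }


def conditional_prob(premise, hypothesis, corpus):
    """Estimate P(hypothesis | premise) as the share of the premise's
    denotation that also lies in the hypothesis's denotation."""
    premise_images = denotation(premise, corpus)
    if not premise_images:
        return 0.0
    hypothesis_images = denotation(hypothesis, corpus)
    return len(premise_images & hypothesis_images) / len(premise_images)


if __name__ == "__main__":
    # A shorter fragment describes more images than the full caption.
    print(sorted(denotation("rides", CORPUS)))
    print(sorted(denotation("man rides a bike", CORPUS)))
    # The score is asymmetric, which is what an entailment test needs.
    print(conditional_prob("man rides a bike", "rides", CORPUS))
    print(conditional_prob("rides", "man rides a bike", CORPUS))
```

In this sketch the shorter fragment "rides" has a larger denotation than the full caption, so the score from the specific description to the general one is high while the reverse is low. A real denotation graph would need linguistically informed fragment generation rather than word-set matching.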
Similar resources
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituen...
SEIMCHA: a new semantic image CAPTCHA using geometric transformations
As the protection of web applications becomes more important every day, CAPTCHAs are receiving growing attention from both users and designers. It is now well accepted that using visual concepts enhances the security and usability of CAPTCHAs. There are a few major ideas for designing image CAPTCHAs. Some methods apply a set of modifications such as rotations to the original imag...
Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine
Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features were assigned to each group of images separately. In this regard, each group image surr...
Enhanced Sports Image Annotation and Retrieval Based Upon Semantic Analysis of Multimodal Cues
This paper presents a framework for semi-automatic annotation and semantic image retrieval, applied to the sports domain, based upon semantic analysis of both image text captions and visual features of the image. Unstructured text captions of images are analysed in order to extract the concepts and restructure them into a semantic model. SVM classification of the multi-dominant colours and edge...
Mapping between image regions and caption concepts of captioned depictive photographs
We discuss the obstacles to inference of correspondences between objects within photographic images and their counterpart concepts in descriptive captions of those images. This is important for information retrieval of photographic data since its content analysis is much harder than linguistic analysis of its captions. We argue that the key mapping is between certain caption concepts representin...